Development of a Hindi Lemmatizer
Authors
Abstract
We live in a translingual society: to communicate with people from different parts of the world, we need expertise in their respective languages. Learning all of these languages is not possible, so we need a mechanism that can do this task for us. Machine translators have emerged as a tool that can perform it. Developing a machine translator requires several different sets of rules. The first module in the machine translation pipeline is morphological analysis, which includes stemming and lemmatization. In this paper we have created a lemmatizer that generates rules for removing affixes and adds rules for creating a proper root word.
Keywords: lemmatizer, lemmatization, inflectional, derivational.
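The two kinds of rules named in the abstract, affix removal followed by root repair, can be sketched as below. This is a minimal illustration, not the paper's implementation: the `RULES` table, the `MIN_STEM_LENGTH` guard, and the `lemmatize` function are all hypothetical names, and the three suffix rules were chosen only because they work for the sample words.

```python
# Hypothetical (suffix, replacement) rules, listed longest suffix first.
# They are illustrative only and are not the rule set from the paper.
RULES = [
    ("ियाँ", "ी"),  # कहानियाँ -> कहानी
    ("ों", "ा"),    # कमरों -> कमरा
    ("ें", ""),     # किताबें -> किताब
]

# Assumed guard so that very short words are not over-stripped.
MIN_STEM_LENGTH = 2


def lemmatize(word: str) -> str:
    """Strip the first matching suffix, then append its replacement."""
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if len(stem) >= MIN_STEM_LENGTH:
                # Removal alone would be stemming; the replacement
                # restores a proper dictionary form.
                return stem + replacement
    return word  # no rule applies: return the word unchanged
```

A purely rule-based table like this over-generates: the second rule would turn घरों into घरा rather than घर, so a practical system would need more specific rules or a lexicon check on the result.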
Similar resources
IndiLem@FIRE-MET-2014 : An Unsupervised Lemmatizer for Indian Languages
An unsupervised and language-independent lemmatization procedure has been developed for major Indian languages (Bengali, Hindi, etc.), which are morphologically very rich and agglutinative in nature. The task of a lemmatizer is to map an inflected surface word to its appropriate dictionary root word, and it is a prerequisite for implementing several NLP tools like Word Sense Disambiguation system...
Design of a Rule Based Hindi Lemmatizer
Stemming is the process of clipping off the affixes from the input word to obtain the respective root word, but stemming does not necessarily give us a genuine and meaningful root word. To overcome this problem we propose a solution: the lemmatizer. Lemmatization is the process by which we carve out the lemma from the given word and can also add additional rules to make the clipped word a proper s...
Using a Lemmatizer to Support the Development and Validation of the Greek WordNet
In this paper we aim to give a description of the computational tools that have been designed and implemented to support the development and validation process of the Greek WordNet, which is currently being developed in the framework of the BalkaNet project. In particular, we focus on the description of a lemmatizer for the Greek language, which has been used as the basis for a number of tools ...
Analysis of the Hindi Circle Method in Determining the Qibla Direction of Mosques (Case Study: Jameh Mosque of Isfahan)
The Kaaba, known as the Qibla and the focal point of Muslims, is of paramount importance. Many scientists in the fields of mathematics, astronomy and geography throughout the Islamic world tried to find exact methods and procedures to determine the direction of the Qibla. One of these methods is the Hindi Circle, used to determine the Qibla of mosques. Most often the scientists, scholars and astronomers gath...
A Self-Learning Context-Aware Lemmatizer for German
Accurate lemmatization of German nouns mandates the use of a lexicon. Comprehensive lexicons, however, are expensive to build and maintain. We present a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.
Journal title:
- CoRR
Volume abs/1305.6211 Issue
Pages -
Publication date 2013